Summarisation through Discourse Structure

Dan Cristea(1,2), Oana Postolache(1,3), and Ionuţ Pistol(1)

(1) “Al. I. Cuza” University, Iaşi, Romania, pic@infoiasi.ro
(2) Centro per la Ricerca Scientifica e Tecnologia, Instituto Trentino di Cultura, Trento, Italy, cristea@itc.it
(3) Computational Linguistics, Saarland University, Saarbrücken, Germany, oana@coli.uni-sb.de

Abstract. In this paper we describe a method to obtain summaries focussed on chosen characters of a free text. Summaries are extracted from discourse structures, which resemble rhetorical trees. They are obtained by exploiting cohesion and coherence properties of the text. The evaluation is designed to show the contribution of each module to the final result.

1 Introduction

In this paper we describe an approach to discourse parsing and summarisation that exploits cohesion and coherence properties of texts. We build discourse structures that resemble RST (Rhetorical Structure Theory) trees, although ours are binary and lack relation names. The output of the parsing process is used to obtain excerpt-type summaries focussed on individual characters mentioned in the text. A combined pipeline/parallel/incremental type of processing is employed. The modules involved are POS-tagging, FDG-parsing, clause segmentation of sentences, construction of elementary discourse trees, detection of noun phrases (NPs), anaphora resolution (AR), discourse parsing and summarisation. To master the combinatorial explosion yielded by different sources of ambiguity, beam-search-based processing is employed. We present the architecture of a discourse parsing system and discuss the evaluation methodology. The final evaluation is carried out by comparing the summaries produced by the system against those collected from human subjects.

Section 2 presents the overall method and the architecture of the system. Section 3 gives a quick overview of veins theory, which underlies the focussed summarisation method. Section 4 presents the method of
incremental parsing and the module that assembles elementary discourse trees corresponding to sentences. Section 5 describes how the exponential explosion induced by different sources of ambiguity is controlled. Section 6 presents the corpus and the evaluation method, and Section 7 discusses the results and draws some conclusions, limitations, and directions for further work.

2 The Method

We call a focussed summary on a character/entity X a coherent excerpt presenting how X is involved in the story that constitutes the content of the text. Such summaries are important in information retrieval tasks on news or scientific papers, when mentions of a certain entity are traced in a document. Note that a generic summary of a discourse will sometimes not include a desired character/entity if this entity appears only collaterally in the given discourse. Suppose, for instance, that a drug company wants to track in medical journals or scientific papers all mentions of a certain drug that it manufactures; neither extraction of the contexts of the drug mentions in the articles nor generic summaries of the articles are of help, as the intention is to know how the drug is mentioned within the general topics of the articles.

We describe the architecture of a system that combines a pipeline style of text processing with a parallel and an incremental one, with the aim of obtaining an RST-like discourse structure that marks topology and nuclearity while ignoring the names of the rhetorical relations. Such trees are then used to compute focussed summaries on certain discourse entities. In the process of building discourse trees, we consider properties of the relationship between reference chains and the discourse structure (a manifestation of cohesion) on the one hand and, on the other hand, between reference chains and the smoothness of centering transitions (a manifestation of coherence). Both reference chains and centering transitions are related to vein expressions
computed following veins theory (VT).

First, the text is POS-tagged, then a syntactic parser (FDG) is run over it. Further, the process is split into two flows: one that segments the sentences into elementary discourse units (edus) and then constructs elementary discourse trees (edts) for each sentence, and another that detects NPs and then runs an AR engine to detect coreferential relations. Intermediate files in the processing flow are in XML format. When two processes join, the resulting files are merged into a single representation.

An edt is a discourse tree whose leaf nodes are the edus of one sentence. Sentence-internal cue-words/phrases trigger the construction of syntactically correct edts from each sentence. Since more than one such tree can usually be drawn from a given sentence, a set of edts is obtained for each sentence in the original text.

At this point a process that simulates the human capacity for incremental discourse processing is started. At any moment in the process, say after n steps corresponding to the first n sentences, a forest of trees is kept, representing the most promising structures built by combining in all possible ways all edts of all n sentences. Each such tree corresponds to one possible interpretation of the text processed so far. Then, at step n+1 of the incremental discourse parsing, the following operations are undertaken: first, all edts corresponding to the next sentence are integrated in all possible ways onto all the trees of the existing forest; then the resulting trees are scored according to four independent criteria, sorted and filtered so that only a fraction of them is retained (again, the most promising after n+1 steps). From the final wave of trees, obtained after the last step, the highest-scored one is considered to be the discourse structure. Summaries are computed on this tree.

A general framework for resolving anaphors has been proposed in the literature. We use this framework to integrate a model of coreference resolution that deals with most types of
anaphors. Centering transition scores are computed after AR is run, that is, after all references are resolved. References and transitions, as well as heuristics for the proper development of a discourse tree, contribute scores to the overall score of a developing discourse tree. These scores are then used to control the beam search.

3 Veins Theory and Focussed Summarisation

Veins theory (VT) is used in the described process to guide the incremental tree building and to synthesize summaries. VT makes two claims: it emphasizes the close relationship between discourse structure and referentiality, as an expression of text cohesion, and it generalizes Centering Theory (CT) to the global discourse, as an expression of text coherence. Moreover, VT adds a view on summarisation (consistent with earlier work) and naturally reveals how focussed summaries can be produced.

The fundamental intuition underlying an integrated account of discourse structure and accessibility in VT is that the RST-specific distinction between nuclei and satellites limits the range of referents to which anaphors can be resolved; in other words, the nucleus-satellite distinction, superimposed over a tree-like structure of discourse, induces a domain of evocative accessibility (dea) for each anaphor. More precisely, for each anaphor x in a discourse unit u, VT hypothesizes that x can be resolved by examining discourse entities from a subset of the discourse units that precede u. In this way VT reveals a “hidden” structure in the discourse tree, called vein. The notion of vein synthesizes observations on how references interact with the discourse structure represented as an RST tree in which names of relations are ignored (we will call such a simplified representation an RST-like tree). Considering the hierarchical organization given by the tree structure and the principle of compositionality, which recursively induces long-distance relations between edus, these observations can be stated as follows:

– a right satellite
or a nucleus can refer to its left nuclear sibling;
– a right nucleus can refer to its left satellite;
– in a combination n1 s1 s2, with s1 and s2 satellites of the nucleus n1, or n1 s1 n2, with s1 a satellite of the nucleus n1 and n2 a right nuclear sibling of n1, s1 is accessible from neither s2 nor n2;
– a nucleus blocks the reference from a right satellite to a left satellite.

The vein expression of an edu u is a list of edus of the discourse, including u, which is meant to express the sequence of units that are significant for understanding u in the context of the whole discourse.

VT classifies references into three categories, according to the way they align along the veins. An anaphor belonging to an edu u2 is said to issue a direct reference if its most linearly recent antecedent belongs to an edu u1 that is included in u2's vein. Under the same notation, it issues an indirect reference if u1 does not belong to u2's vein, but there is a more distant antecedent, say belonging to an edu u0, and u0 is placed on u2's vein. If the backward-looking reference chain of the anaphor does not intersect the vein of the anaphor's edu, we have an inferential reference.

VT conjectures two types of anaphoric processes: evocative (or immediate) and post-evocative (or inferential). The evocative processes are the most frequent, are rapid, and can be realised by any referential means, including those as fragile as empty pronouns. They make the discourse fluid and increase text cohesion. Post-evocative anaphora are less frequent, induce more inferential load on the reader (hearer) and make use of strong referential means (proper nouns, for instance).

The vein of an edu also gives a summary of the whole discourse, focussed on that particular unit. Now, suppose one discourse entity is traced and a summary focussed on that entity is desired. If there is only one edu in which the entity is mentioned, the vein expression of that edu gives a very well-focussed summary of the entity. A problem appears if the entity is
mentioned in more than just one edu. Because there is no a priori reason to prefer one of the focussed summaries obtained in this way to any of the others, it is clear that a combination of the vein expressions of each edu in which the entity is mentioned should be considered. We have tested several methods of building a final summary from the collection of particular summaries. The first method takes the vein expression of the lowest node of the tree that covers all units in which the entity is mentioned. Since the length of a vein expression is proportional to the depth of the node in the tree structure, this method results in shorter summaries. The second method selects the particular summary (vein expression) that includes the most mentions of the entity. The third method simply takes the union of all vein expressions of the units that mention the central entity. Finally, the fourth method builds a histogram from all vein expressions of the units mentioning the central entity and selects all units above a certain threshold. The summaries produced by the last two methods are not in themselves vein expressions, and are therefore more prone to incoherence than those of the first two methods, the last one being the most exposed.

4 Incremental Parsing

The basic step in an incremental discourse parser is the integration of an elementary tree (edt), which corresponds to a sentence, into the tree representing the discourse structure of the discourse parsed so far. Of the two operations that earlier work on incremental discourse processing applies at each step, the present implementation uses only adjunction. Cue-words and cue-phrases are connectives with a signalling function on: the nuclearity of the edus they interconnect, the form of the edt they belong to, and the place on the right frontier of the developing tree where an edt is to be adjoined. Subordinate connectives, like just as, although, as long as, whenever, because, etc., link subordinate clauses (satellite structures) onto regent
clauses (nuclear structures), while coordinate connectives, like and, or, etc., usually link sibling nuclear structures. There are also frequent cases in which connectives are missing completely. In line with previous work, we consider that each edt covers exactly one sentence.

Different patterns of arguments for cue-words have been manually selected from a corpus. Fig. 1 depicts some cases (the dots suggest the nuclearities of their arguments). There are frequent cases in which the same marker has more than one argument pattern.

Fig. 1. Argument patterns of cue-phrases

As constraints for building syntactically correct trees we have used rules described in the literature. Such constraints configure edts in which inner nodes are labelled with markers and leaf nodes with edu labels. Each node of the tree is also marked by a nuclearity function with n (for nuclear) or s (for satellite), so that at each level, of the two descendants of an inner node, at least one is marked n, and the root of an edt is always marked s. Since the number of inner nodes of a binary tree with t leaf nodes is t-1, for an edt to be completely determined it needs a number of cue-words, as inner edt nodes, one less than the number of edus. For this reason we apply heuristics to add dummy cue-words where they are missing. Dummy cue-words are empty strings that behave like and, with both arguments labelled as nuclear (the implicit assumption is that a satellite is announced by a cue-word).

Incremental parsing as previously proposed is deterministic: heuristics help, at each step, to adjoin the current edt at the place on the right frontier of the developing tree that maximizes the chances of arriving at a correct final analysis. Our analysis, instead, is not deterministic. At each step, all possible trees resulting from the application of cue-word argument-structure patterns and syntactic constraints are generated and then adjoined in all possible positions of the right frontier of the developing tree. To control the exponential explosion
induced by this luxurious behaviour we implemented a beam-search-like process.

5 The Beam-Search Control

Any beam-search-like process depends heavily on a scoring function able to assess the relevance of the objects produced at intermediate steps, which are successively detailed or improved until a final object, supposed to satisfy the goal, is obtained. In this section we explain our scoring criteria.

An empirical evaluation of VT's conjectures has been described previously. Experiments on corpora annotated for both discourse structure (RST) and coreference have shown that VT's conjectures are generally correct. The authors of VT report that 87.1% of all references they found in the investigated corpus are direct, and 8.5% are indirect. The remaining 4.4% escape the predictions of VT, some being classified as of a pragmatic type (not needing an antecedent in order to be understood). In another article, the figure reported for references disobeying VT's predictions is greater: 12.3%. However, an important aspect is that the frequencies of exceptions per type align with their evoking power, as shown in Table 1.

type of reference    percent disobeying VT
pragmatic            56.3%
proper nouns         22.7%
common nouns         16.0%
pronouns              5.0%

Table 1. Exceptions from VT's cohesion conjecture, by type of reference

The evoking power of each of these types of referring expressions (REs) decreases as we move down the list. Pragmatic references are those which refer to entities that can be assumed to be part of general knowledge, such as the Senate, or our in the phrase our streets. The order in the table suggests that pragmatic references are easily understood without an antecedent, while proper nouns and common noun phrases are understood less and less easily. At the other extreme, pronouns have very poor evoking power: a message emitter employs them only when s/he is certain that the structure of the discourse allows for an easy recovery of the antecedent from the message receiver's memory. Except for the cases where a pronoun can be
understood without an antecedent (as in the example with our in our streets), the use of a pronoun referring to an antecedent that is outside the dea should produce an invalid message.

Since the detection of pragmatic references requires knowledge that goes beyond the possibilities of our sources, we considered only the last three types of anaphors in Table 1 for the scoring criterion based on references. To score references in relation to veins we have assigned the values 2, 1 and 0 to direct, indirect and outside vein, respectively. Then, to score the anaphor type we have assigned the values 3, 2 and 1 to the following categories of anaphors: pronoun, common noun and proper noun, respectively. We then multiplied these two scores for each anaphor, allowing each anaphor to contribute to the general score of the tree with a value between 0 and 6, with 0 meaning that all of its antecedents are outside the dea of the unit of the anaphor, and 6 corresponding to a pronoun whose most recent antecedent is on the dea of the unit the anaphor belongs to. This is the s_r section of the score (see below).

The second tree-scoring criterion uses the coherence conjecture of VT. As in previous work, we let each unit contribute a score between 0 and 4, depending on the type of centering transition between the current unit and the previous unit in the vein expression, in ascending order of smoothness: no Cb, abrupt shift, smooth shift, retaining and continuing. As will be shown below, the score formula is designed to keep track of the relationship between references and structure. This is the s_c section of the score (see below). The overall contribution to the score of a tree coming from VT represents the s1 section of the score formula, and has the following form:

\[ s_1 = \sum_{u \in D} \left( w_1 \sum_{x \in RE_u} \frac{s_r^x}{6} + w_2 \, \frac{s_c^u}{4} \right) \qquad (1) \]

where u is an edu, D represents the whole discourse, x is an anaphor, RE_u is the set of the anaphors belonging to unit u which have antecedents outside that unit, s_r^x is the referential score contributed by the anaphor x
and s_c^u is the centering score contributed by the unit u. The two weights w1 and w2 sum up to 1 and are iteratively computed to fit the scoring scheme optimally to the expected results (see footnote 1).

During the experiments we noticed a tendency of the parse trees to be skewed downward and to the right (a tree with this particular shape corresponds to a discourse in which each edu adds a detail to the preceding one, while a tree completely skewed upward and to the right corresponds roughly to a discourse in which each edu adds a detail to the initial edu). To balance this tendency we gave a better score to an adjunction of an edt on the upper part of the right frontier of the developing tree than to one on the lower part. The contribution of this criterion represents the s2 section in the score formula (see below).

Section s3 of the score formula below is meant to penalize too many nuclear nodes in the final tree. A tree that has only nuclear nodes is a flat structure, but between the two daughters of a node at least one should be nuclear. So, s3 is the fraction between the number of satellites and the total number of nodes of the tree.

Finally, the last section of the score, s4, reflects the quality of the edts which are built from sentences. Each edt is compared against the structure returned by the FDG parser (only for English) with respect to the nuclearity of the edus (0.5) and the identity of the sibling node in the structure (0.5), and then we average the sum over the number of edus in the segment.

At each step of the search we have a number N of developing trees, and to each of them we adjoin in all possible ways all computed edts. The score of each new developing tree thus obtained is calculated as the product s1*s2*s3*s4. Then we sort all these trees in descending order of their scores and retain for the next step the N best-rated trees. At the end of the run, the best-scored final tree is considered to represent the discourse structure.

Footnote 1: The careful reader will notice that the weights of this score
scheme configure a kind of discourse model, since the number of anaphors per unit is not constant, making the two terms of the sum contribute unequally to the overall score in different discourses. It remains to be verified whether the two weights stabilise around some constant values over large corpora, in which case they would indicate the salience of the two criteria.

6 Corpus and Evaluation

We have run parallel experiments on both Romanian and English. As a test we have used a fragment totalling 812 words from G. Orwell's novel “1984” in the English version and 863 words in its Romanian equivalent.

We believe that the evaluation of a complex NLP system should follow a procedure that facilitates an easy inventory of the degradation of performance along the processing chain. This way, the identification of critical points of the system is straightforward and repairs can be focussed on the points of maximum trouble. In this section we show how we use such a methodology to evaluate our summarizer for both English and Romanian. The overall processing flow of the system and the points where the “temperature” is measured are depicted in Fig. 2. Early processing phases, such as POS-tagging and FDG-parsing, are considered to be included in the input in this scheme. Processing modules are indicated by light grey rectangles, evaluation results by red squares (dark in a black-and-white image), and files by rounded rectangles: those which are pure outputs of processing modules in white, and those influenced in any way by a gold standard in yellow (slightly shaded). The names of the files indicate their origin; for instance, np-seg-gold-ar-edt-tree-test is a file that records a gold standard (gold) of manually annotated noun phrases (np) and edus (seg), as well as the results (test) of running the AR module (ar), the edt-detector module (edt) and the discourse parser module (tree). Also, sum-gold and all-test are the two most distant final files, recording respectively the
gold standard of the summary and the output of a complete and pure (no human intervention) processing chain.

All initial gold standards, seg-gold, np-gold and np-ar-gold, have been created by master's students in Computational Linguistics, while the sum-gold file was built with the help of a class of 91 final-year undergraduate students in Computer Science, during an NLP examination. They received the initial text in which edus were already marked and numbered and were asked to indicate 4 summaries by writing down sequences of discourse unit numbers: a general summary of the whole text of about 20% reduction rate and three summaries focussed on different characters mentioned in the text (Winston's mother, Winston's sister and the girl with black hair). For each edu of the original text we counted the number of times this edu was included in the students' summaries. This yielded a histogram, with the sequence of edu numbers on the x-axis and the frequency of mentioning on the y-axis. We then considered a sliding horizontal threshold on this histogram, and accepted as belonging to the gold summary all units whose corresponding frequencies were above the threshold. During tests we set the threshold to 20 hits, which resulted in a gold summary of length 30 edus.

Fig. 2 shows the processing flow and results for the implementation running on English texts. In the upper part of the diagram the evaluation points are meant to determine the behaviour of the segment-detector, the NP-detector and the AR-engine, independently of the overall summarisation task of the system. In the figure, P stands for precision, R for recall and SR for success rate.

Fig. 2. Processing and evaluation points

Precision and recall in the case of the segment-detector have been computed in terms of segment borders, while success rate is computed as the number of words correctly assigned to segments (belonging to edus around the same main verb), divided by the total number of words.

As Fig. 2 shows,
we do not have a gold standard for discourse structure (a file tree-gold is absent). The difficulty of building reliable RST-like annotations over corpora has been stressed repeatedly in the literature. To evaluate our trees we used summaries instead, which are much easier to acquire. If summaries extracted automatically, as by-products of a discourse parsing process, resemble those indicated by human subjects, then we should have a high degree of confidence that the structures themselves reflect the content of the text with enough accuracy.

As the baseline for our general summary evaluation we have used the summary produced by MS Word on the same text. As the baseline for the three focussed summaries we selected all sentences containing the expressions his mother, his sister and girl.

7 Discussions and Conclusion

As seen in Fig. 2, the segment-detector behaves satisfactorily. A lower precision but very good recall was also obtained for the NP-detector. A significant deterioration of the results is expected to occur following the AR phase, since the extreme extravagance of a free text such as Orwell's novel and the need to trace all types of anaphors at once made the resolution of coreferring anaphora a very difficult task. Comparing the two SR values (0.65 versus 0.6), one can perceive the influence of the NP-detector on the deterioration of the performance of the AR-engine. This behaviour conforms to expectations, since NPs are the referential expressions that are processed by the AR-engine.

The edt-extractor computed edts as shown in Table 2.

No. of edus    No. of sentences of this length    No. of generated edts per sentence
1-3            25                                 1-4
4-5             9                                 5-28
6               1                                 42

Table 2. Statistics of the edt-extractor

We tested our discourse parser (D-parser in Fig. 2) on the set of 83 edus, which were grouped in 35 sentences in both seg-gold and seg-test. To master the tree explosion we have used a threshold policy: after each step of the D-parser we kept only the most promising trees, whose combined scores lie within a threshold of zero under
the best score (that is, only the trees tied for the maximum score). Using this policy, the maximum number of trees generated in any of the 35 steps was 320. The implementation was done in Java.

To learn the optimum values of the weight parameters w1 and w2 of formula (1) we ran the whole parser 10 times, modifying w1 by 0.1 at each step (remember that w2 = 1 - w1). The final results are shown in Fig. 2. For comparison, the MS Word baseline was rated with a recall of 0.18. Also, the best student summary was rated 0.59. As seen, the results are above the baseline, although the R values are still low. Interestingly, our values are not much lower than that of the best human-produced summary. Also, the comparison points, as displayed in Fig. 2, validate the expectations: the more gold components we incorporate, the more accurate the results are.

We can estimate the impact of the component modules on the summaries by computing the differences between R values at the ends of the thick red double arrows: NP-detector + segment-detector, as the difference between np-seg and overall values = 0.06; edt-detector, as the difference between edt and overall values = 0.04; and AR-engine, as the difference between AR2 and AR1 values = 0.01. So, it seems that low-level processes, such as detection of NPs and segmentation, influence the summarisation results more than high-level processes such as edt-detection and anaphora resolution.

The results on Romanian are still under development, but we expect them to be below those for English because of the lack of an FDG parser.

The following aspects will be the subject of further work: retraining of the process with different heuristics, implementation of the substitution operation in incremental discourse parsing, and improvement of the performance of the modules, starting with the NP-detector and the sentence segmentation.

8 Acknowledgements

Our thanks go to our students, who made the manual annotations and produced the summaries that made the final evaluation possible. Special thanks go
to our colleagues from the Laboratory of Computational Linguistics of the University of Wolverhampton, who gave us the FDG-annotated version of “1984”.

References

1. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: A theory of text organization. Text 8:3 (1988) 243–281
2. Grosz, B.J., Joshi, A.K., Weinstein, S.: Centering: A framework for modelling the local coherence of discourse. Computational Linguistics (1995)
3. Cristea, D., Ide, N., Romary, L.: Veins theory: A model of global discourse cohesion and coherence. In: Proceedings of COLING/ACL, Montreal/Canada (1998)
4. Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. The MIT Press (2000)
5. Cristea, D., Postolache, O., Puşcaşu, G., Ghetu, L.: Local and global information exploited in producing summaries. In: Proceedings of the International Symposium on Reference Resolution and Its Applications to Question Answering and Summarisation, Venice/Italy (2003)
6. Cristea, D., Dima, G.E.: An integrating framework for anaphora resolution. Information Science and Technology 4 (2001) 273–291
7. Cristea, D.: The relationship between discourse structure and referentiality in veins theory. In Menzel, W., Vertan, C., eds.: Natural Language Processing between Linguistic Inquiry and System Engineering. Al. I. Cuza University Publishing House (2003)
8. Cristea, D., Webber, B.: Expectations in incremental discourse processing. In: Proceedings of ACL, Madrid/Spain (1997)
9. Knott, A.: A Data-Driven Methodology for Motivating a Set of Coherence Relations. PhD thesis, Department of Artificial Intelligence, University of Edinburgh (1996)
10. Şoricuţ, R., Marcu, D.: Sentence level discourse parsing using syntactic and lexical information. In: Proceedings of HLT-NAACL, Edmonton, Canada (2003)
11. Cristea, D., Ide, N., Marcu, D., Tablan, V.: An empirical investigation of the relation between discourse structure and co-reference. In: Proceedings of COLING, Saarbrücken/Germany (2000) 208–214
12. Ide, N., Cristea, D.: A hierarchical account of referential accessibility. In: Proceedings of ACL, Hong Kong (2000)
13. Gundel, J., Herberg, N., Zacharski, R.: Cognitive status and the form of referring expressions in discourse. Language 69 (1993) 274–307
14. Mitkov, R.: Anaphora Resolution. Longman, London (2002)
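
Illustrative sketch of the scoring in Section 5. The paper states that the system was implemented in Java and gives no code, so the following Python fragment is only a minimal sketch of how the s1 section of the score (formula (1)) and the beam pruning step could be computed. The container Unit, the function names, the dictionary keys and the tuple layout of the candidates are hypothetical; the mapping of centering transitions to the values 0 to 4 assumes the ascending order of smoothness given in the text.

```python
from dataclasses import dataclass

# Values stated in Section 5 (dictionary keys are illustrative names).
VEIN_SCORE = {"direct": 2, "indirect": 1, "outside": 0}
TYPE_SCORE = {"pronoun": 3, "common_noun": 2, "proper_noun": 1}
# Assumed mapping: 0..4 in ascending order of smoothness.
TRANSITION_SCORE = {
    "no_cb": 0,
    "abrupt_shift": 1,
    "smooth_shift": 2,
    "retaining": 3,
    "continuing": 4,
}


@dataclass
class Unit:
    """Hypothetical container for one edu."""
    anaphors: list    # (vein_relation, anaphor_type) pairs with antecedents outside the unit
    transition: str   # centering transition from the previous unit on the vein


def reference_score(vein_relation, anaphor_type):
    """s_r^x: vein value times anaphor-type value, between 0 and 6."""
    return VEIN_SCORE[vein_relation] * TYPE_SCORE[anaphor_type]


def s1(units, w1):
    """Formula (1): weighted sum of reference and centering scores, with w2 = 1 - w1."""
    w2 = 1.0 - w1
    total = 0.0
    for unit in units:
        refs = sum(reference_score(v, t) for v, t in unit.anaphors)
        total += w1 * refs / 6 + w2 * TRANSITION_SCORE[unit.transition] / 4
    return total


def prune(candidates, beam_width):
    """Keep the beam_width best trees.

    Each candidate is a (tree, s1, s2, s3, s4) tuple; trees are ranked by
    the product s1*s2*s3*s4, as described in Section 5.
    """
    ranked = sorted(candidates, key=lambda c: c[1] * c[2] * c[3] * c[4], reverse=True)
    return ranked[:beam_width]
```

For example, a unit holding a pronoun with a direct reference and a continuing transition contributes w1 + w2 = 1 to s1, the maximum a single anaphor and transition can contribute. The fixed beam width used here corresponds to the N-best policy of Section 5; the tie-only threshold policy reported in Section 7 would instead keep every candidate whose product equals the maximum.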